DOCS-4086: code samples for building a good dataset #4413

nathan-contino · 2025-06-25T21:07:26Z

- Adds SDK code samples for Python, Go, TypeScript, and Flutter for each step of dataset creation.
- Splits the content across multiple pages because the 'create a dataset' page was already too long.
- Splits 'leverage AI' into two sections because it's already too full (and this guide is part of an effort to break it up anyway).

Most non-SDK content is repurposed from the existing 'create a dataset' page.

Apologies for the large line-changed count; hard to avoid when you're splitting up pages and creating examples across multiple languages.

netlify · 2025-06-25T21:07:31Z

✅ Deploy Preview for viam-docs ready!

Name	Link
🔨 Latest commit	`1873f7f`
🔍 Latest deploy log	https://app.netlify.com/projects/viam-docs/deploys/68641389f5c51e00088db8d9
😎 Deploy Preview	https://deploy-preview-4413--viam-docs.netlify.app
Lighthouse	1 paths audited. Performance: 40 (🔴 down 19 from production); Accessibility: 100 (no change from production); Best Practices: 100 (no change from production); SEO: 92 (no change from production); PWA: 70 (no change from production)


JessamyT

Some of my comments are probably unintentionally out-of-diff on content that was repurposed; apologies. Did not review code in any detail since I assume that's all tested. Also GitHub seems to be bugging so will end review now lest comments not actually show up. LMK if I should provide code review (to the extent that I'm qualified to do so :D )

docs/data-ai/train/create-dataset.md

docs/data-ai/train/capture-images.md

docs/data-ai/train/train-tflite.md

docs/data-ai/train/update-dataset.md

docs/data-ai/train/create-dataset.md

Co-authored-by: Jessamy Taylor <[email protected]>

npentrel

Some mostly minor feedback - overall great direction.
The only bigger feedback I have is that this creates quite a lot of new pages, and I am not entirely convinced we need that many. Creating a dataset, for example, is fairly short; should that just be an include that's part of some of the other pages? Will need to think more about that.

docs/data-ai/train/create-dataset.md

docs/data-ai/train/capture-images.md

docs/data-ai/train/update-dataset.md

docs/data-ai/train/annotate-images.md

JessamyT

The flow is hard because there are so many ways to do each step, and some of them can happen in any order (e.g. add to dataset then annotate or vice versa?), not just one linear path. The true flow chart is a pile of strands of spaghetti that each fork off into multiple ends. So I guess this is a plausible way and I don't currently have a better suggestion for how to present the path(s).

Noticed a couple more things; commented. Generally not blocking except maybe get image vs get images discrepancy in code samples?

docs/data-ai/train/update-dataset.md

docs/data-ai/train/train-tflite.md

docs/data-ai/train/capture-images.md

docs/data-ai/train/create-dataset.md

docs/data-ai/train/capture-images.md

Co-authored-by: Jessamy Taylor <[email protected]>

docs/data-ai/_index.md

Co-authored-by: Naomi Pentrel <[email protected]>

docs/data-ai/_index.md

docs/data-ai/train/train-tflite.md

npentrel · 2025-06-30T15:50:45Z

docs/data-ai/train/create-dataset.md

I don't love the cards there because they go against the flow that the "next" buttons try to suggest. I think a sentence with links to the steps is less confusing because there are fewer boxes.

I have an alternate suggestion. Why don't we do:

1. Capture and annotate images
2. Create a training dataset (which includes adding to a training dataset)

capture and annotate could be separate but I feel like that might make create a training set less awkward?

docs/data-ai/train/update-dataset.md

npentrel · 2025-06-30T15:59:28Z

docs/data-ai/train/update-dataset.md

+{{% /tab %}}
+{{< /tabs >}}
+
+## Capture, annotate, and add images to a dataset


This is still very odd to me. It essentially does both capture + annotate (which is the next page) and adding to a dataset, when we've split those across three pages. Either they're together and we have them on one page, or this doesn't make sense.

this snippet was inspired by the wine-pouring demo example linked to me as a good pattern, so i wanted to find a place for it. if you feel it doesn't fit into the flow of the pages as-is, would you rather i:

- removed this example entirely?
- rearranged the pages, perhaps combining add and annotate?

Reorganized according to your other comment; hopefully that helps!

yep this lgtm. I think keeping it is good, though maintenance will be painful so future us might disagree

noting to self that we can significantly simplify this code when uploadfiletodataset happens

npentrel · 2025-06-30T16:02:33Z

docs/data-ai/train/annotate-images.md

+
+## Classify images with tags
+
+Classification determines a descriptive tag or set of tags for an image.


an image here might be great and would convey this more immediately, but that doesn't necessarily need to happen with this PR

I'll brainstorm some ideas for this and submit a request

npentrel · 2025-06-30T16:05:03Z

docs/data-ai/train/annotate-images.md

+
+{{< alert title="Tip" color="tip" >}}
+
+Unless you already have an ML model that can generate tags for your dataset, use the Web UI to annotate.


This is confusing because it sort of begs the question: if I do have a model, what then? So it should link to that code, which maybe means that this should go to the annotate page: https://deploy-preview-4413--viam-docs.netlify.app/data-ai/train/update-dataset/#capture-annotate-and-add-images-to-a-dataset

I think I misunderstand you, isn't this already on the annotate page? I put this admonition here to help guide people into the appropriate tab in the tabset that follows. Where are you thinking we could link?

Regardless, I reworded this away from the question-begging 'unless', but let me know if I'm missing something else.

docs/data-ai/train/capture-images.md

Co-authored-by: Naomi Pentrel <[email protected]>

docs/data-ai/train/create-dataset.md

JessamyT

Sorry but latest changes are also a bit confusing....suggested a possible product change 😬

docs/data-ai/train/create-dataset.md

JessamyT · 2025-06-30T21:49:40Z

docs/data-ai/train/create-dataset.md

Right now there is still capture and add to dataset content on two different pages which feels confusing.
It's hard to force a linear flow here since with all the different options for each step, the same order doesn't always apply:

In most cases, like using the mobile app, uploading a batch, or adding existing images, you can get the data first and then create a dataset and add the data to it. Order doesn't matter, but it's easiest conceptually to think of getting data and then making a dataset with it.

One exception: If you capture individual images through the UI and add them to a dataset on the spot, you have to have a dataset to add them to before you start capturing.

In the script version of that same "Capture individual images" heading, it looks like you don't specify a dataset id, so you'd still have to add to a dataset later like with the other methods.

Suggestion:

- Implement Naomi's order suggestion, and:
- Get rid of the exception: Ask eng to change that capture button to not save to dataset but rather just save the image to your captured data. Just one single click, so you can capture more images in rapid succession, then add a batch all at once later. This would be a less clunky UX IMO, and also solve a docs flow problem.
  - If this will take a while but they'll do it, don't worry about this flow for now; document the rest per Naomi's order
  - If this can't/won't ever be changed in eng, document this as a thing you can do but shape the docs around the normal capture-then-make-a-dataset order

Co-authored-by: Jessamy Taylor <[email protected]>

docs/data-ai/train/create-dataset.md

docs/data-ai/train/capture-annotate-images.md

Co-authored-by: Naomi Pentrel <[email protected]>

viambot · 2025-07-01T16:58:18Z

It looks like the following files may have been renamed. Please ensure you set all needed aliases:
rename docs/data-ai/{ai/advanced => train}/_index.md (37%)
rename docs/data-ai/{ai => train}/train-tflite.md (86%)
rename docs/data-ai/{ai => train}/train.md (99%)
rename docs/data-ai/{ai/advanced => train}/upload-external-data.md (95%)

github-actions · 2025-07-01T17:01:09Z

🔎💬 Inkeep AI search and chat service is syncing content for source 'Viam Docs'

DOCS-4086: code samples for building a good dataset

fcdac53

viambot added the `safe to build` label (This pull request is marked safe to build from a trusted zone) Jun 25, 2025

nathan-contino added 6 commits June 25, 2025 17:08

Lint code samples

ce48bc1

Fix broken link

a67c831

Fix missing path, elongate description

24275fd

Fix most markdownlint complaints

7ee9e5f

Merge branch 'main' into DOCS-4086-build-good-dataset

09ba15a

Fix more linter errors

fc6cb9c

JessamyT reviewed Jun 26, 2025

npentrel reviewed Jun 26, 2025

docs/data-ai/train/create-dataset.md (outdated, resolved)

Apply suggestions from code review

a4e38dc

Co-authored-by: Jessamy Taylor <[email protected]>

npentrel reviewed Jun 26, 2025

nathan-contino added 8 commits June 26, 2025 09:23

Implement additional docs feedback

6a5f3fe

add more links

9fdd043

Fix lint errors, update capture examples

2cfdde6

remove duplicate blank lines

332949c

Fix prettier issues

bebdd3d

fix link

377d432

Make it easier to reach subsequent guide pages from create dataset page

c7276df

Slight rewords to address remaining feedback on annotation SDK snippets

f8e3f60

JessamyT reviewed Jun 27, 2025

docs/data-ai/train/create-dataset.md (outdated, resolved)

JessamyT reviewed Jun 28, 2025

docs/data-ai/train/create-dataset.md (outdated, resolved)

nathan-contino commented Jun 30, 2025

docs/data-ai/train/capture-images.md (outdated, resolved)

nathan-contino commented Jun 30, 2025

docs/data-ai/train/capture-images.md (outdated, resolved)

nathan-contino commented Jun 30, 2025

docs/data-ai/train/capture-images.md (outdated, resolved)

Apply suggestions from code review

f6022f8

Co-authored-by: Jessamy Taylor <[email protected]>

npentrel reviewed Jun 30, 2025

docs/data-ai/_index.md (outdated, resolved)

Update docs/data-ai/_index.md

4205f6a

Co-authored-by: Naomi Pentrel <[email protected]>

npentrel reviewed Jun 30, 2025

nathan-contino and others added 3 commits June 30, 2025 14:19

Update docs/data-ai/_index.md

a9c56a7

Co-authored-by: Naomi Pentrel <[email protected]>

Implement naomi feedback

d26e7c0

Fix consecutive blank lines

dfceaf7

JessamyT reviewed Jun 30, 2025

docs/data-ai/train/create-dataset.md (outdated, resolved)

JessamyT reviewed Jun 30, 2025

Apply suggestions from code review

c6ff8ae

Co-authored-by: Jessamy Taylor <[email protected]>

nathan-contino commented Jul 1, 2025

docs/data-ai/train/create-dataset.md (outdated, resolved)

nathan-contino added 4 commits July 1, 2025 09:23

Update docs/data-ai/train/create-dataset.md

9a222f8

Move annotate single step to annotate page

046d6ed

Delete duplicate newline

b40c713

Add newline

6450cb3

npentrel reviewed Jul 1, 2025

docs/data-ai/train/capture-annotate-images.md (outdated, resolved)

Update docs/data-ai/train/capture-annotate-images.md

9f3126c

Co-authored-by: Naomi Pentrel <[email protected]>

npentrel approved these changes Jul 1, 2025

nathan-contino added 3 commits July 1, 2025 12:39

Fix typo

7e864d2

Clean up tag verbiage

b9d08b4

Verbiage improvement

1873f7f

nathan-contino merged commit 06c1acb into viamrobotics:main Jul 1, 2025
12 checks passed

nathan-contino deleted the DOCS-4086-build-good-dataset branch July 1, 2025 17:00

